Cleaning the data

Creating new variable ‘DATE’ using date format.

Price_demand dataset
TOTALDEMAND PRICECATEGORY DATE
4179.21 LOW 2021-01-01
4047.76 LOW 2021-01-01

Assigning dummy for PRICECATEGORY.

Price_demand dataset
DATE Daily_use number_of_low number_of_high number_of_medium number_of_extreme
2021-01-01 185853.4 47 0 0 0
2021-01-02 197990.1 48 0 0 0

Creating new variables for seasons. Creating new dummyh variable ‘weekday’.

Price_demand_b dataset
DATE Daily_use number_of_low number_of_high number_of_medium number_of_extreme weekday month spring summer autumn winter
2021-01-01 185853.4 47 0 0 0 1 1 0 1 0 0
2021-01-02 197990.1 48 0 0 0 0 1 0 1 0 0

Merging price_demand data with weather data.

merged_df dataset
DATE Daily_use number_of_low number_of_high number_of_medium number_of_extreme weekday month spring summer autumn winter Minimum temperature (°C) Maximum temperature (°C) Rainfall (mm) Evaporation (mm) Sunshine (hours) Direction of maximum wind gust Speed of maximum wind gust (km/h) Time of maximum wind gust 9am Temperature (°C) 9am relative humidity (%) 9am cloud amount (oktas) 9am wind direction 9am wind speed (km/h) 9am MSL pressure (hPa) 3pm Temperature (°C) 3pm relative humidity (%) 3pm cloud amount (oktas) 3pm wind direction 3pm wind speed (km/h) 3pm MSL pressure (hPa)
2021-01-01 185853.4 47 0 0 0 1 1 0 1 0 0 15.6 29.9 0 2.8 9.3 NNE 31 13:14:00 19.2 77 6 N 2 1018.8 28.1 43 5 E 13 1015.3
2021-01-02 197990.1 48 0 0 0 0 1 0 1 0 0 18.4 29.0 0 9.4 1.3 NNW 30 08:22:00 23.3 52 7 NNW 17 1013.3 28.7 38 7 SW 4 1008.5

Removing variables that are difficult to assign dummy value. For example, 3 pm wind direction may result in 10 new variables if we transform it, which is no a good idea.

merged_df1 dataset
DATE Daily_use number_of_low number_of_high number_of_medium number_of_extreme weekday month spring summer autumn winter Minimum temperature (°C) Maximum temperature (°C) Rainfall (mm) Evaporation (mm) Sunshine (hours) Speed of maximum wind gust (km/h) 9am Temperature (°C) 9am relative humidity (%) 9am cloud amount (oktas) 9am MSL pressure (hPa) 3pm Temperature (°C) 3pm relative humidity (%) 3pm cloud amount (oktas) 3pm MSL pressure (hPa)
2021-01-01 185853.4 47 0 0 0 1 1 0 1 0 0 15.6 29.9 0 2.8 9.3 31 19.2 77 6 1018.8 28.1 43 5 1015.3
2021-01-02 197990.1 48 0 0 0 0 1 0 1 0 0 18.4 29.0 0 9.4 1.3 30 23.3 52 7 1013.3 28.7 38 7 1008.5

Renaming some variables.

merged_df2 dataset
DATE Daily_use number_of_low number_of_high number_of_medium number_of_extreme weekday month spring summer autumn winter min_temp_c max_temp_c rainfall evaporation_mm sunshine_hours speed_max_wind_gust temp_9am_c rh_9am_perc 9am cloud amount (oktas) nine_am_pressure temp_3pm_c rh_3pm_perc cloud_3pm_oktas three_pm_pressure
2021-01-01 185853.4 47 0 0 0 1 1 0 1 0 0 15.6 29.9 0 2.8 9.3 31 19.2 77 6 1018.8 28.1 43 5 1015.3
2021-01-02 197990.1 48 0 0 0 0 1 0 1 0 0 18.4 29.0 0 9.4 1.3 30 23.3 52 7 1013.3 28.7 38 7 1008.5

A line chart to display the trend of energy use overtime.

Creating pair-wise plot to display the correlation and scatterplot between variables.

Creating new plots to better indicate the relationship between Daily_use and predictors. We added trasnsparancy on the scatterplot in order to fix overlapping problem.

correlation_matrix
Daily_use number_of_low number_of_high number_of_medium number_of_extreme weekday month spring summer autumn winter min_temp_c max_temp_c rainfall evaporation_mm sunshine_hours speed_max_wind_gust temp_9am_c rh_9am_perc 9am cloud amount (oktas) nine_am_pressure temp_3pm_c rh_3pm_perc cloud_3pm_oktas three_pm_pressure
Daily_use 1.0000000 -0.6486831 0.5446934 0.5601229 0.3414465 0.4547346 0.5222037 NA -0.3374950 -0.2390443 0.5409376 -0.4614776 -0.3506542 -0.0603322 -0.2756115 -0.3086694 0.0873061 -0.3900082 0.1657039 0.0020121 -0.0161070 -0.3694518 0.1935807 0.1780545 -0.0535781
number_of_low -0.6486831 1.0000000 -0.7543944 -0.9070916 -0.5038979 -0.0715533 -0.6609555 NA 0.5404847 0.1398841 -0.6217476 0.6542296 0.6041825 -0.0129612 0.5824422 0.3403338 0.2385755 0.6362787 -0.3847040 0.0334731 -0.2282225 0.5994938 -0.2834451 -0.1137308 -0.2195534
number_of_high 0.5446934 -0.7543944 1.0000000 0.4247106 0.6751180 0.1065513 0.3945850 NA -0.3202322 -0.1654139 0.4514594 -0.4823317 -0.4393110 -0.0284224 -0.4097627 -0.2305303 -0.2368901 -0.4852407 0.3005811 -0.0068354 0.1222115 -0.4331732 0.2165947 0.0601918 0.1176180
number_of_medium 0.5601229 -0.9070916 0.4247106 1.0000000 0.1730938 0.0408755 0.6666045 NA -0.5462917 -0.1030931 0.5898805 -0.5994293 -0.5598558 0.0262214 -0.5423530 -0.3257323 -0.1574252 -0.5715348 0.3386211 -0.0509126 0.2244703 -0.5577139 0.2575888 0.1261782 0.2128493
number_of_extreme 0.3414465 -0.5038979 0.6751180 0.1730938 1.0000000 0.0315793 0.2409673 NA -0.1977857 -0.0468168 0.2231217 -0.3269125 -0.2860957 0.0247878 -0.2908059 -0.1487062 -0.2846035 -0.3380818 0.2208182 0.0436172 0.1447757 -0.2793110 0.1351115 0.0009159 0.1554132
weekday 0.4547346 -0.0715533 0.1065513 0.0408755 0.0315793 1.0000000 -0.0140053 NA -0.0077818 0.0212844 -0.0145007 0.0070838 -0.0280319 -0.0748971 -0.0443668 -0.0815154 0.0759442 0.0147757 0.0065947 0.1287655 0.0063398 -0.0072436 0.0189138 0.0587115 -0.0241657
month 0.5222037 -0.6609555 0.3945850 0.6666045 0.2409673 -0.0140053 1.0000000 NA -0.7501234 -0.1743774 0.8430113 -0.7551763 -0.6938343 -0.0284618 -0.5512772 -0.2746479 0.1442636 -0.7309199 0.2268789 -0.1191908 0.1070607 -0.6745757 0.1264899 0.0974287 0.0815570
spring NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
summer -0.3374950 0.5404847 -0.3202322 -0.5462917 -0.1977857 -0.0077818 -0.7501234 NA 1.0000000 -0.4493986 -0.4374573 0.6145650 0.5817236 0.0048614 0.5792860 0.3251612 0.0363461 0.6165902 -0.2911332 0.1053383 -0.2254264 0.5683194 -0.1797625 -0.1029104 -0.2200678
autumn -0.2390443 0.1398841 -0.1654139 -0.1030931 -0.0468168 0.0212844 -0.1743774 NA -0.4493986 1.0000000 -0.6067258 0.0893810 0.0801467 0.0254932 -0.0827573 -0.0686916 -0.2472186 0.0718794 0.0296371 -0.0457864 0.2650694 0.0814422 -0.0138200 0.0040628 0.2867537
winter 0.5409376 -0.6217476 0.4514594 0.5898805 0.2231217 -0.0145007 0.8430113 NA -0.4374573 -0.6067258 1.0000000 -0.6368293 -0.5983106 -0.0299877 -0.4321602 -0.2201917 0.2165118 -0.6210139 0.2292254 -0.0476437 -0.0662317 -0.5876873 0.1738691 0.0874828 -0.0928275
min_temp_c -0.4614776 0.6542296 -0.4823317 -0.5994293 -0.3269125 0.0070838 -0.7551763 NA 0.6145650 0.0893810 -0.6368293 1.0000000 0.7038737 0.0392659 0.6580855 0.1010270 0.0234143 0.9117839 -0.3205211 0.1945832 -0.3140257 0.6592271 -0.0590947 0.0537684 -0.2818428
max_temp_c -0.3506542 0.6041825 -0.4393110 -0.5598558 -0.2860957 -0.0280319 -0.6938343 NA 0.5817236 0.0801467 -0.5983106 0.7038737 1.0000000 -0.1291405 0.6176569 0.4830470 -0.0764549 0.8203638 -0.3024219 -0.1818626 -0.0906471 0.9676495 -0.4635275 -0.2473333 -0.1750171
rainfall -0.0603322 -0.0129612 -0.0284224 0.0262214 0.0247878 -0.0748971 -0.0284618 NA 0.0048614 0.0254932 -0.0299877 0.0392659 -0.1291405 1.0000000 -0.0475968 -0.1479469 0.0443535 -0.0279934 0.1974026 0.0689167 -0.1353564 -0.1311749 0.1926251 0.1130831 -0.0560418
evaporation_mm -0.2756115 0.5824422 -0.4097627 -0.5423530 -0.2908059 -0.0443668 -0.5512772 NA 0.5792860 -0.0827573 -0.4321602 0.6580855 0.6176569 -0.0475968 1.0000000 0.2872632 0.1443648 0.7085524 -0.5248660 0.0008664 -0.2658679 0.5537847 -0.2093419 -0.1033393 -0.2486408
sunshine_hours -0.3086694 0.3403338 -0.2305303 -0.3257323 -0.1487062 -0.0815154 -0.2746479 NA 0.3251612 -0.0686916 -0.2201917 0.1010270 0.4830470 -0.1479469 0.2872632 1.0000000 -0.0407631 0.2196623 -0.2580691 -0.5914561 0.2058461 0.5071857 -0.5618426 -0.7233729 0.1345751
speed_max_wind_gust 0.0873061 0.2385755 -0.2368901 -0.1574252 -0.2846035 0.0759442 0.1442636 NA 0.0363461 -0.2472186 0.2165118 0.0234143 -0.0764549 0.0443535 0.1443648 -0.0407631 1.0000000 0.0813137 -0.3721490 0.0042275 -0.4169185 -0.1208083 -0.0748268 0.1265609 -0.4339075
temp_9am_c -0.3900082 0.6362787 -0.4852407 -0.5715348 -0.3380818 0.0147757 -0.7309199 NA 0.6165902 0.0718794 -0.6210139 0.9117839 0.8203638 -0.0279934 0.7085524 0.2196623 0.0813137 1.0000000 -0.4598704 0.0531181 -0.2692844 0.7632658 -0.1658883 -0.0130164 -0.2797938
rh_9am_perc 0.1657039 -0.3847040 0.3005811 0.3386211 0.2208182 0.0065947 0.2268789 NA -0.2911332 0.0296371 0.2292254 -0.3205211 -0.3024219 0.1974026 -0.5248660 -0.2580691 -0.3721490 -0.4598704 1.0000000 0.0831009 0.1343266 -0.2541619 0.4023932 0.0428020 0.1375042
9am cloud amount (oktas) 0.0020121 0.0334731 -0.0068354 -0.0509126 0.0436172 0.1287655 -0.1191908 NA 0.1053383 -0.0457864 -0.0476437 0.1945832 -0.1818626 0.0689167 0.0008664 -0.5914561 0.0042275 0.0531181 0.0831009 1.0000000 -0.1902516 -0.1997774 0.4318312 0.3942641 -0.1203394
nine_am_pressure -0.0161070 -0.2282225 0.1222115 0.2244703 0.1447757 0.0063398 0.1070607 NA -0.2254264 0.2650694 -0.0662317 -0.3140257 -0.0906471 -0.1353564 -0.2658679 0.2058461 -0.4169185 -0.2692844 0.1343266 -0.1902516 1.0000000 -0.0281510 -0.1385351 -0.2942258 0.9611733
temp_3pm_c -0.3694518 0.5994938 -0.4331732 -0.5577139 -0.2793110 -0.0072436 -0.6745757 NA 0.5683194 0.0814422 -0.5876873 0.6592271 0.9676495 -0.1311749 0.5537847 0.5071857 -0.1208083 0.7632658 -0.2541619 -0.1997774 -0.0281510 1.0000000 -0.5527841 -0.2830432 -0.1245512
rh_3pm_perc 0.1935807 -0.2834451 0.2165947 0.2575888 0.1351115 0.0189138 0.1264899 NA -0.1797625 -0.0138200 0.1738691 -0.0590947 -0.4635275 0.1926251 -0.2093419 -0.5618426 -0.0748268 -0.1658883 0.4023932 0.4318312 -0.1385351 -0.5527841 1.0000000 0.3538531 -0.0306435
cloud_3pm_oktas 0.1780545 -0.1137308 0.0601918 0.1261782 0.0009159 0.0587115 0.0974287 NA -0.1029104 0.0040628 0.0874828 0.0537684 -0.2473333 0.1130831 -0.1033393 -0.7233729 0.1265609 -0.0130164 0.0428020 0.3942641 -0.2942258 -0.2830432 0.3538531 1.0000000 -0.2303897
three_pm_pressure -0.0535781 -0.2195534 0.1176180 0.2128493 0.1554132 -0.0241657 0.0815570 NA -0.2200678 0.2867537 -0.0928275 -0.2818428 -0.1750171 -0.0560418 -0.2486408 0.1345751 -0.4339075 -0.2797938 0.1375042 -0.1203394 0.9611733 -0.1245512 -0.0306435 -0.2303897 1.0000000

Creating new data set by adding new transformed variables.

merged_df6 dataset
DATE Daily_use number_of_low number_of_high number_of_medium number_of_extreme weekday month spring summer autumn winter min_temp_c max_temp_c rainfall evaporation_mm sunshine_hours speed_max_wind_gust temp_9am_c rh_9am_perc 9am cloud amount (oktas) nine_am_pressure temp_3pm_c rh_3pm_perc cloud_3pm_oktas three_pm_pressure min_temp_c2 max_temp_c2 temp_9am_c2 temp_3pm_c2 sunshine_hours2 min_temp_c3 max_temp_c3 temp_9am_c3 temp_3pm_c3 sunshine_hours3
2021-01-01 185853.4 47 0 0 0 1 1 0 1 0 0 15.6 29.9 0 2.8 9.3 31 19.2 77 6 1018.8 28.1 43 5 1015.3 243.36 894.01 368.64 789.61 86.49 3796.416 26730.9 7077.888 22188.04 804.357
2021-01-02 197990.1 48 0 0 0 0 1 0 1 0 0 18.4 29.0 0 9.4 1.3 30 23.3 52 7 1013.3 28.7 38 7 1008.5 338.56 841.00 542.89 823.69 1.69 6229.504 24389.0 12649.337 23639.90 2.197